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We introduce a minimalistic model based on dynamic node deletion and node duplication with 
heterodimerisation. The model is intended to capture the essential features of the evolution of 
protein interaction networks. We derive an exact two-step rate equation to describe the evolution 
of the degree distribution. We present results for the case of a fixed-size network. The results are 
based on the exact numerical solution to the rate equation which are consistent with Monte Carlo 
simulations of the model's dynamics. Power-law degree distributions with apparent exponents < 1 
were observed for generic parameter choices. However, a proper finite-size scaling analysis revealed 
that the actual critical exponent in such cases is equal to 1. We present a mean- field argument 
to determine the asymptotic value of the average degree, illustrating the existence of an attractive 
fixed point, and corroborate this result with numerical simulations of the first moment of the degree 
distribution as described by the two-step rate equation. Using the above results, we show that the 
apparent exponent is determined by the heterodimerisation probability. Our preliminary results 
are consistent with empirical data for a wide range of organisms, and we believe that through 
implementing some of the suggested modifications, the model could be well-suited to other types of 
biological and non-biological networks. 
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I. INTRODUCTION 

In recent years, models of networks growing 
via single-node duplication, divergence and mu- 
tation of links, considered in isolation and com- 
bination, have assumed prominence in the lit- 
erature on complex networks. In a series of in- 
dependent studies it was suggested that these 
duplication-divergence-mutation models (here- 
after called 'duplication models' for brevity) are 
good candidates to describe the evolution and 
large-scale topological features of real protein- 
protein interaction networks (PINs) in several 
organisms such as S.Cerevisiae and H. Pylori 
[1-6]. 

In duplication models, proteins are repre- 
sented by nodes, and a pairwise interaction be- 
tween any two proteins is represented by an 
undirected link between the associated nodes, 
assumed to be fully operative at all times. In 
a duplication event, a mother node is chosen 
uniformly at random (u.a.r.) and each of its 
links are copied to a newly created daughter 
node. Divergence refers to the subsequent loss 
of links from the daughter node [2, 4-7], and/or 
the mother node [8], or a shared neighbour of 
both the daughter and mother node [3]. Of- 
ten, for simplicity, it is assumed that only the 
daughter node diverges. In a mutation event, 
new links are added between the daughter node 
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and all other nodes in the network which are not 
already connected to the mother node. Typ- 
ically, one duplication-divergence event occurs 
at each update step, and mutations, when con- 
sidered, arc modelled at a rate much less than 
the divergence rate - typically, one new link per 
update step is added [4-6]. 

The idea of evolution through gene duplica- 
tion is taken from biology [9-12], where it was 
popularised in the 1970's by Ohno who conjec- 
tured that single and whole genome duplica- 
tions could provide the raw material for evo- 
lutionary diversification [13]. Whilst there is 
mounting evidence that duplicate genes do oc- 
cur in genomes [1, 12, 14-17], it is widely ac- 
knowledged that little is known about the de- 
tails of the process of duplication itself, such 
as the frequency of duplication events, the fate 
of duplicate genes, and the frequency with 
which duplicate genes acquire novel functions 
[1, 9, 11, 18]. In fact, whilst the microscopic pa- 
rameters are not yet known exactly, it has been 
suggested by Berg et al. [19] that the rate of 
gain and loss of interactions through mutations 
is at least an order of magnitude higher than 
the growth rate due to duplications; as a result, 
the link dynamics act as the dominant evolu- 
tionary force shaping the large-scale statistical 
structure of the network, and not the gene du- 
plications; the slower gene duplication events 
only really affect the size of the network. Thus, 
it is still unclear if, and to what extent, gene du- 
plication is the dominant mechanism responsi- 
ble for the observed statistical features of PINs. 
Moreover, gene deletion and rearrangement are 
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known to play important roles in the long-term 
evolution of genomes [10, 17, 20] but have re- 
ceived comparatively little attention in the lit- 
erature on networks addressing PINs [20, 21]. 

In this paper, in a move to go beyond the 
duplication models and to expand upon the 
emerging literature on network models of PINs 
including gene deletion, we present a four- 
parameter model addressing the scenario of net- 
work evolution through dynamic total node re- 
moval in conjunction with growth by node du- 
plication and heterodimerisation. We refer to 
our network model as a deletion- duplication- 
divergence-heterodimerisation (DDDH) model 
[22] . Although the model is primarily aimed at 
describing the evolution of PINs, its description 
is kept general enough so as to be applicable to 
a diverse range of complex systems where com- 
ponents are added and removed throughout the 
system's evolution. 

In the wider literature on complex networks, 
previous studies which have considered node 
deletion have generally regarded it as a pertur- 
bation effect, used to test the tolerance of a net- 
work to random and targeted attack [3, 25-27] . 
More recently, the mechanism of dynamic node 
removal in conjunction with growth by prefer- 
ential attachment has been explored in indepen- 
dent studies by Chung and Lu [28] , Cooper and 
Frieze [29], and Wang [30]. They refer to such 
models as growth- deletion models. Since we 
consider growth by duplication as opposed to 
growth by preferential attachment, our model 
therefore also contributes to the literature on 
growth-deletion models. 

We focus our analysis on the degree distri- 
bution, P(k), characterising the probability for 
a node to have exactly k links [31-33]. The 
degree distribution is the simplest topological 
feature to measure and as a result, it has at- 
tracted and received the most attention in the 
literature on complex networks. It has been 
shown [2-4, 6-8] that the degree distribution of 
networks generated by duplication models ex- 
hibits a power-law tail, P(k) ~ fc -7 , for k 3> 1, 
where the critical exponent 7 can be tuned such 
that it is in agreement with observed exponents 
which arc found to be in the range 1 < 7 < 3 
[2, 34]. Importantly, it has also been shown that 
the degree distribution is a robust and generic 
property of PINs common across different data 
sets - an important consideration given that 
current experimental techniques are notorious 
for suffering from a high rate of false positives 
and false negatives [19, 34]. 

The paper is organised as follows. In Sec. 
II, we present the formulation of the dynamics 
which describe the rules of evolution of the net- 



work, defining the parameters and interpreting 
the rates. In Sec. Ill, we present and discuss 
the exact two-step rate equation for the evo- 
lution of the degree distribution. In Sec. IV, 
we present results obtained from Monte Carlo 
(MC) simulations of the model and the exact 
numerical solution of the rate equation for a 
generic choice of parameter values. We discuss 
our results in Sec. V and end with a conclusion 
in Sec. VI. 



II. DDDH MODEL 

We consider undirected networks where loops 
and multiple links are forbidden. We start with 
a network of known size, N, and degree distri- 
bution, P(k, t = 0), and allow it to evolve under 
the following rules (see Fig. 1): 



DELETION 
• . • 

Pdel 



DUPLICATION 
t > ( 

Pdupl 



FIG. 1: (Colour online) A schematic representation 
of a network evolving through deletion and dupli- 
cation, (i) A node is chosen u.a.r. and deleted with 
probability pdci (grey) . (ii) A mother node is chosen 
u.a.r. and duplicated with probability pdupi (green 
— > red). The links are retained with probability p 
(dashed red line), and a further link is established 
between the daughter and mother node with prob- 
ability 6 (dotted blue line). 

1. Deletion. With probability pdel, a node is 
chosen u.a.r. This node and all of its links are 
deleted from the network. 

2. Duplication-Divergence-Heterodimerisation. 
With probability pdupl, a mother node is cho- 
sen u.a.r. and duplicated. This entails a new 
daughter node being added to the network and 
linking to each of the neighbours of the mother 
node with probability p. A further link is 
established between the daughter and mother 
node with heterodimerisation probability 9. 

The evolution of the network is thus governed 
by four parameters: Pdei,Pdu P i,P, 0. In Sec. Ill, 
we cast these rules into a concrete mathemati- 
cal framework. First, however, it is instructive 
to discuss the motivation behind the choice of 
dynamics and its implications. 

The mechanism of growth by duplication is 
preferred to the mechanism of growth by pref- 
erential attachment in PIN network models as 
well as many other network models since it re- 
produces the effects of preferential attachment 
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without having to artificially put the mecha- 
nism in. In other words, it arises naturally 
from the dynamics: nodes that have a large 
number of links are more likely to be neigh- 
bours of a duplicating node, and hence are more 
likely to gain a link to the newly created node. 
The DDDH model preserves this effect and in- 
troduces another one: implicit preferential de- 
tachment. Nodes that have a large number of 
links are more likely to be the neighbour of 
a node chosen for deletion, and therefore will 
be more likely to lose a link each time a node 
is deleted. It is interesting to note that these 
two effects do not cancel out each other, as one 
might intuitively expect. The inclusion of hct- 
crodimcrisation, 8 > 0, means that we do not 
consider mutations in the traditional sense (as 
described above), but rather we restrict the ad- 
dition of new links to only occur between the 
mother and the daughter node. Heterodimeri- 
sation is preferred over mutations as it has been 
noted that the former increases the likelihood 
of clique-formation - a feature observed in real 
PINs - whilst the latter, in order to form the 
observed number of triads and higher cliques, 
would require a prohibitively/physically unre- 
alistic high rate [8]. Moreover, in duplication 
models, whilst a lack of random linking (mu- 
tations) has been found to destroy fine struc- 
ture such as the self-averaging and existence of 
a smooth degree distribution [4] , the large-scale 
statistical features of the final network do not 
depend on the existence of mutations [3-5] . 

From the general definition of the model's 
dynamics given above, one can already deter- 
mine specific features of the network, namely 
its overall size, by focusing on particular choices 
of the model's parameter values. For exam- 
ple, for a network that remains, on average, 
fixed in size, pdd = Pdupi- We will refer to 
this case as a 'fixed-size' network. For a net- 
work that is monotonically growing, on aver- 



age, pdupi > Pdei; if Pdd = no nodes are 
removed from the network. Moreover, if we 
fix 6 = and < p < 1 and Pdupi = 1 f° r 
this case, the DDDH model is equivalent to 
the duplication-divergence model [5, 6, 8, 35]. 
For p d ci = 0,p dup i = 1,6 = and p = 1 
the duplication-divergence model is equivalent 
to Polya's Urn [2, 8, 36]. For an on average 
monotonically shrinking network, pdupi < PdeV, 
if Pdupi = no new nodes or links are added 
to the network [37]. For a network fluctuat- 
ing in size the values of pdei and pdupi would 
be stochastic variables chosen anew at each up- 
date step from a suitably chosen distribution 
[38]. If p — 1, the daughter node inherits all 
of the links; this is the case of perfect, or full 
duplication. If < p < 1, the daughter node 
inherits only some of the links; this represents 
imperfect, or partial duplication. If p = 0, the 
daughter node inherits none of the links, that 
is, an isolated node is added to the network. 

In this paper, we study the case of fixed-size 
networks, pdei = Pdupi = 1, with heterodimcri- 
sation 9 > and perfect duplication p = 1 , un- 
less otherwise stated (in which case, similar to 
Refs. [2, 4, 6] we assume that only the daughter 
node diverges) [40]. 

In Sec. Ill, we apply the rate equation ap- 
proach [4, 41] to study the evolution of the ex- 
pected number of nodes with k links at time t, 
f(k,t) = N t P(k,t) where N t is the total num- 
ber of nodes at time t. 



III. DDDH MODEL: RATE EQUATION 

In this section, we present a two-step rate 
equation for the general DDDH model, and 
then explain in detail the origin of each term. 

The two-step rate equation for the DDDH 
model is given by, 



m-^n t(u*\ /(M) kf(k,t) , (fc+l)/(fc+M) 

f(k,t+l) =f(k,t) -Pdel^r Pdcl +Pdc\ , (la) 

t(v ^ t(v +J _n fc/(M+l) J(k,t+1) 
f{k,t+2) =f(k,t + l) -pdu P \P t- t PdupiO — 77 

(k-l)f(k-l,t+l) n f(k-l,t+l) 

JV 4+ i iV t+ i 



,k-ir N, 

j>{k-l) 



t+1 



+f>du P i(i-0)£ (Qfti- P y-*l&££. (ib) 
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Equation (1) is exact and there are no 
approximations in its derivation. It de- 
scribes the DDDH process for all parameters, 
Pdebf?dupi,P> 6. It holds for all fc > 0, with 
f{-l,t) = and f(k > fc max ,i) = for all t. 
Moreover, f(k, t) satisfies the following normal- 
isation conditions: 

oo 

J2f(k,t)=N t , (2a) 

fc=0 

oo 

J2kf(k,t) = 2L t , (2b) 

fe=0 

where L t is the total number of links at time t 
in the network. 

The distinct feature of the DDDH rate equa- 
tion compared to rate equations for duplication 
models is that it is defined in two steps, and not 
one, reflecting the fact that we now include a 
node deletion in addition to node duplication. 
To keep the notation simple, we have written 
t + 1 and t + 2 in Eq. (1) but one might have 
equally well written t + 1/2 and t + 1 to in- 
dicate that we only observe the network after 
both the deletion and the duplication steps have 
been completed. 

Moreover, each of these actions is exe- 
cuted sequentially, highlighting the fact that 
we clearly also make the distinction between 
this process and the process of adding nodes at 
an 'effective' duplication rate, pdupi — Pdci, or 
merging nodes as is Refs. [23, 24], or substitut- 
ing nodes where the duplication of a node au- 
tomatically implies the deletion of some other 
node as in Ref. [20]. We note that the exact 
correspondence between time, t, and real bio- 
logical timescales is unclear, however, at this 
stage. 

In Eq. (la) we consider the effects on the net- 
work of the deletion of a node and the removal 
of its links. The probability a deletion event 
occurs is given by pdci- The terms on the right- 
hand side (RHS) are interpreted as follows. A 
loss in the number of nodes with degree k at 
time t + 1 from deletion will occur either if the 
node deleted is of degree k at time t, or if a 
neighbour (of arbitrary degree) of a fc-node is 
chosen for deletion, as the fc-node will lose a 
link and become a node of degree k — 1. Since 
every node has an equal probability of being 
deleted in a given time step, a node of degree 
k is chosen with probability f(k,t)/N t (second 
term); the probability that a neighbour of a fc- 
node is chosen for deletion is kf(k,t)/N t (third 
term). The final term on the RHS represents 
a gain in the number of nodes with degree fc 
at time t. This can occur if the neighbour of a 
node with degree fc + 1 is deleted. Given that a 



node of degree fc + 1 has fc + 1 neighbours, the 
probability a neighbour is chosen for deletion is 
(k + l)f(k + l,t)/N t . 

In Eq. (lb) we consider the effects of node 
duplication and subsequent heterodimerisation. 
The probability a duplication event occurs is 
given by pdupi, and the probability that subse- 
quent heterodimerisation occurs is given by 9. 
All but the two final terms on the RHS repre- 
sent changes a duplication-heterodimerisation 
event has on the existing nodes in the net- 
work; the last two terms represent the daughter 
node's contribution. 

A loss in the number of nodes with degree fc 
at time t+2 from duplication and heterodimeri- 
sation can occur in one of two ways. If one of 
the neighbours (of arbitrary degree) of a node 
of degree fc at time t + 1 is duplicated, the fc- 
node will, with probability p gain a link from 
duplication thus becoming a fc + 1 node at time 
t + 2 (second term). Alternatively, if the mother 
node to be duplicated is already of degree fc at 
time t + 1, it will become a node with degree 
fc+1 at time t+2 by gaining a link to the daugh- 
ter node via heterodimerisation. The probabil- 
ity a node of degree fc is chosen for duplication 
is given by /(fc, t+1) /N t +i, and the probability 
of heterodimerisation is 9 (third term). 

We arrive at the fourth and fifth terms which 
describe the gain in the number of nodes with 
degree fc at time t + 2 by similar considerations. 
If one of the neighbours of a node of degree 
fc — 1 at time t + 1 is chosen for duplication, 
the fc — 1 node will, with probability p, gain a 
link from duplication, thus becoming a fc node 
at time t + 2 (fourth term). Alternatively, if the 
mother node to be duplicated is of degree fc — 1 
at time t+1, it will become a node with degree 
fc at time t + 2 by gaining a link to the daughter 
node via heterodimerisation (fifth term). 

The final two terms on the RHS of Eq. (lb) 
account for the daughter node's contribution. 
The sixth term is to account for the contribu- 
tion of the daughter node in the event it does es- 
tablish a link, with probability 9, to the mother 
node. The seventh term is to account for the 
case where it docs not, which happens with 
probability (1 — 9). Note the lower limits on 
these sums are not identical. This is because if 
a link is established via heterodimerisation, in 
order to become a node of degree fc, the daugh- 
ter node is restricted to copying fc — 1 out of 
j links, each with probability p. However, if 
such a link is not established, the daughter node 
is restricted to copying fc out of j links of the 
mother node. 

Since the exact analytical solution to the rate 
equation in Eq. (1) is not tractable at present, 
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in Sec. IV we present results obtained from the 
numerical solution of the exact two-step rate 
equation and compare them with MC simula- 
tions of the model. Unless otherwise stated, 
our analysis is based on the case of a fixed-size 
network, pdoi = Pdupi = 1, evolving through 
perfect duplication, p = 1, with heterodimeri- 
sation, 9 > 0. 



IV. DDDH MODEL: RESULTS 

1. Comparison between exact numerical solution 
of the rate equation and MC simulations 

Figure 2 displays the evolution of the degree 
distribution obtained from the exact numeri- 
cal solution of the rate equation compared to 
MC simulations of the model. The stationary 
regime is defined by P(k, t) = P{k, t + 2). 

We start with an initial network of N = 400 
nodes, with a random degree distribution cen- 
tred around ki n a = 100, and iterate the rules 
with the following parameter settings: pdoi = 
Pdupi = 1 (fixed-size network), p = 1 (perfect 
duplication), with heterodimerisation 9 = 0.1. 
The value for 9 was chosen as such as it is be- 
lieved that heterodimerisation occurs at a rate 
not greater than 0.1 [8]. In Fig. 2, we show 
two snapshots of the network in the transient 
regime when t — 1000, 5000, respectively, and 
one in the stationary regime for t — 10 6 . The 
MC simulations are averaged over 10 5 realisa- 
tions for t = 1000, 5000 and 3 x 10 3 realisations 
for t = 10 6 . 

There is excellent agreement between the ex- 
act numerical solution and MC results, lending 
support to our statement earlier that there are 
no approximations involved in our derivation 
of the rate equation. We have verified through 
extensive simulations that the agreement holds 
true over a range of Pdd,Pdu P i,P, and 9 values. 
Hence, all remaining figures are generated using 
data obtained from the exact numerical solu- 
tion of the rate equation only, hereafter referred 
to as 'exact numerical results'. The interesting 
feature to note is that even for the simplest re- 
alisation of the DDDH model which we have 
presented in Fig. 2, fat-tailed degree distri- 
butions are obtained in the stationary regime 
(t = 10 6 curve). This is in stark contrast to 
the duplication models where only the case of 
imperfect duplication leads to a power law [2]. 
In the following section, we describe quantita- 
tively the exact form of the stationary degree 
distribution. 
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FIG. 2: Exact numerical (solid line) and MC (open 
circles) results of the degree distribution of a fixed- 
size network, pdei =f>du P i = 1, with iV = 400 nodes, 
evolving under perfect duplication, p=l, with het- 
erodimerisation 8 = 0.1. Snapshots were taken at 
times t = 1000, 5000 (transient regime) and t = 10 6 
(stationary regime). The MC simulations are aver- 
aged over 10 5 realisations for t = 1000, 5000, and 
3 x 10 3 realisations for t = 10 6 . The exact nu- 
merical results show excellent agreement with the 
MC simulations. The distribution in the stationary 
regime is well approximated by a power-law decay, 
P(k) ~ k" 1 with an apparent exponent 7 = 0.8 and 
sharp cut-off at fc C ut-off = 399. Note that the max- 
imum degree a node can attain in networks of the 
type we consider is N — 1. 



2. Scaling for p de i = Pdupi = p = 1,0 < < 1/2 

We are interested in quantifying the form 
of the degree distribution, in the stationary 
regime, as a function of the model's parameters. 
Given that we are, for the moment, investigat- 
ing fixed-size networks evolving under perfect 
duplication, PdebPdupi and p are all fixed. This 
reduces the number of variables to just one: 9. 
However, given that we are observing the de- 
gree distribution for specific fixed network sizes, 
we have N as another variable in the problem. 
Hence, we would like to know how P(k) de- 
pends on 9 and N. In order to investigate this, 
we have performed numerical simulations for 
the following two cases: (i) Fixed 9, varying 
N, and (ii) Fixed N, varying 9. We discuss (i) 
in this subsection, and (ii) in Sec. IV. 3. 

Figure 3(a) shows the exact numerical re- 
sults of the degree distribution, P(k; N) versus 
k on a double logarithmic plot in the station- 
ary regime for 9 = 0.1 and networks of increas- 
ing V, specifically, N = 50,100,200,400,1000 
nodes. There are clear power-law fluctuations 
in the node degrees present in the network, im- 
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FIG. 3: (a) Exact numerical results of the de- 
gree distribution in the stationary regime for fixed- 
size networks, pdei = Pdupi = 1, where N = 
50, 100, 200, 400, 1000 (marked with lines of increas- 
ing dash-length) each having evolved through per- 
fect duplication p = 1 with heterodimerisation 
6 = 0.1. There is no typical node-degree. For 
large node-degrees, the degree distribution is well- 
approximated by a power-law decay, P(k; N) ~ 
k 1 , with an apparent exponent 7 = 0.8. The 
power law is characterised by a sharp cut-off at 
fcmax = N — 1, which increases with increasing 
system size. Note that the maximum degree a 
node can attain in networks of the type we con- 
sider is N — 1. (b) Data collapse of the exact 
numerical results of the degree distribution is ob- 
tained by plotting the transformed probability den- 
sity kP(k;N) vs. the rescaled degree k/N. The 
curves collapse onto the graph of the scaling func- 
tion, Q(k/N) = r(1 _ 7) 1 r(1+7) (fc/^) 1 ' 7 (l - k/N)\ 
see Eq. (10). 



plying an appreciable probability of finding a 
node with degree, 1 < fc <C fc ma x in the network, 
fcmax marks the cross-over between a power-law 
decay and a rapid decay in P(k;N). In par- 
ticular, fc ma x represents a characteristic scale in 
the node degree resulting from the finite size of 
the networks we can study numerically. Hence, 
we can say that P(k; N) decays as a power law 
for 1 < k -C fc max and has a sharp cut-off for 
k ~3> fcmax, which can be expressed informally 



as, 

P(k, N) OC I snar p cu t-ofF, k » fcmax- ^ 

From simulations, we find that fc max = TV — 1 
which is equivalent to the maximum possible 
degree that a node in the network can ac- 
quire. This implies that fc ma x increases linearly 
with increasing network size, N, hence, in the 
limit of TV — > 00, the characteristic scale di- 
verges and a pure power law is recovered, as 
expected. In the region, 1 < k <C fc ma x, we find 
that the gradient of the lines in Fig. 3(a) are 
well-approximated by an 'apparent' exponent, 
7 = 0.8 and we will shortly demonstrate that 
7 = 1 — 28. Generally speaking, an exponent 
less than 1 is unusual, and seems to contradict 
certain known results about scaling functions. 
However, we will be able to resolve this appar- 
ent contradiction in Sec. IV. 3. 

With the above discussion in mind, we pro- 
pose the following general ansatz for P(k; N), 

P(k; N) = a(N)k-^g(k/N), (4) 

and, assuming for simplicity that fc ma x is ap- 
proximated by N, the equation is valid for 
1 < fc < N, where TV > 1 and 7 < 1. In 
Eq. (4), a(N) is a prefactor, dependent on 
the network size, and Q{x) is the cut-off func- 
tion, dependent on the rescaled variable k/N. 
Q(x) is required to fall-off fast enough to en- 
sure P(k; N) is finite and integrable. From Fig. 
3(a) it seems reasonable to assume that Q{x) 
is constant for x <C 1 and decays abruptly for 
x > 1. 

Note that Eq. (4) is not a finite-size scaling 
(FSS) ansatz since the prefactor, rather than 
being a constant (as is typical), is TV-dependent. 
This difference turns out to be important as it 
leads to an interesting result about the critical 
exponent which we will demonstrate below. 

In theory, we can use the fact that the prob- 
ability density function must be properly nor- 
malised, 

/OC 
P(k;N)dk = l, (5) 

to derive an expression for a(N). However, 
without knowing a priori the correct scaling for 
P(fc; N), we can only guess at the form of Q(x). 
For simplicity, we assume that the cut-off func- 
tion, Q(x), is of the form, 

f (1 - k/NV for 1 < fc < N 
y[x> ~ \ otherwise W 

(which is in-keeping with our stated require- 
ments for the form of Q(x) and is an excellent 
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fit to the numerics). Substituting this into Eq. 
(4) we find that normalisation requires, 



a{N)k-^{l-k/N)'<dk = l. (7) 

Evaluating the LHS of Eq. (7) we find for N > 
1, 

a(A0A rl - 7 r(l-7)r(l + 7) = 1, (8) 
and it immediately follows that, 



a(N) = 



r(i- 7 )r(i + 7)' 



(9) 



Using this result for a(N), we can recast Eq. (4) 
into a scaling ansatz such that an actual critical 
exponent equal to 1 is obtained, as follows, 

P ^ = r(i-^iV ~ W) 

= fc " V7)W+7) ^ (fc/A0 
= k-'Gik/N), (10) 

where Q{x) = r ( 1 ^ y ) 1 r (i +7 ) xl ~ 7 ^( x )- Equation 
(10) is our 'proper' FSS ansatz. Wc have shown 
using consistent arguments that Eq. (4) can be 
recast into Eq. (10) assuming the cut-off func- 
tion, Q{x) is of the form given in Eq. (6), and 
using the requirement that the probability den- 
sity function must be properly normalised to 
derive an expression for the iV-dependent pref- 
actor, a(N). The point of interest is that on the 
LHS of Eq. (10) the leading power-law term has 
attained a fixed value equal to 1, independent of 
7. Thus, even if the apparent measured expo- 
nent, 7 , is in the range [0, 1) the actual critical 
exponent is always equal to 1. 

To test the validity of the scaling ansatz given 
in Eq.(10), we have, in Fig. 3(b), plotted the 
transformed probability density kP{k;N) ver- 
sus the rescaled variable k/N using the same 
data as in Fig. 3(a). Multiplying both sides of 
Eq. (10) by k we get, 



kP(k;N) = g(k/N). 



(11) 



Therefore, we expect the curves to collapse onto 
the curve §(k/N) = r ^ r{1+y) (k/N)^(l - 

k/N) 1 , with the gradient of the slope to be 
equal to 1— 7. As shown in Fig. 3(b), a convinc- 
ing data collapse is obtained, with all curves 
collapsing onto the scaling function described 
by g(x) with 7 = 1 - 29 = 0.8. We have re- 
peated the data collapse for different values of 
9, and observed a convincing data collapse in all 



cases, with all curves collapsing onto the scaling 
function described by, Q{x) with 7 = 1 — 29. 

Together, Eq. (4) and the success of the data 
collapse in Fig. 3(b) demonstrate that the de- 
gree distribution for any network size, N, is de- 
termined by the scaling function Q{x). This 
means that we can deduce the degree distribu- 
tion for any network size, N, without having 
to actually perform the numerical simulation 
itself. Hence, our results are applicable to net- 
works larger than those which we have demon- 
strated directly, that is for N > 10 3 . This is 
in contrast to the duplication models consid- 
ered in [8] where the networks generated do not 
attain power-law degree distributions even for 
very large networks. 

Thus far, we have not yet justified the rela- 
tion given between the apparent exponent and 
the parameter 9. In the following section, wc 
derive a result for the average degree, (k). We 
then demonstrate how we can use this result to 
find an expression for the apparent exponent 7 
in terms of the parameter, 9. 



3. Mean-field Equation for the Average Degree 

The average degree, (k), can be determined 
in various different ways. Ideally, one would be 
able to calculate it directly from the two-step 
rate equation for the evolution of the degree dis- 
tribution in Eq. (1). This would be achieved by 
taking the first moment of the normalised de- 
gree distribution, P(k,t) = f(k,t)/N t , accord- 



ing to, 



(k)t 



1; 



kP(k,t)dk. 



(12) 



Strictly speaking, where we write integration 
signs we should have sums, as we are dealing 
with a discrete probability distribution. Ei- 
ther way, the solution we are interested in is 
lim t ^oo(k)t = {k), which is analytically in- 
tractable. In order to get around this prob- 
lem we have calculated this quantity numeri- 
cally, and compared this result to the asymp- 
totic value of (fc) obtained as a solution to a 
mean-field rate equation approach (described 
below) . 

Figure 4 illustrates the exact numerical re- 
sults for the time-evolution of (k) for 9 e 
[0.01,0.5] and clearly illustrates the existence 
of a stationary asymptotic (fc). Wc now de- 
scribe our mean-field argument to determine 
the asymptotic value of the average degree, (k). 
At each time step, for each node we delete we 
lose on average, (k) t links, and for each node 
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we duplicate we gain on average, p(k)t+i + 9 
links. Therefore, the net change in the num- 
ber of links, AL = -{k) t + p{k) t+ i + 9. For 
the case of p = 1, we can rewrite this as, 
AL = —^r- + 2 ,^ t+1 + 9, where we have used 
the standard relation, (k) = 2L/N. Imposing 
the condition AL = 0, which is valid in the 
stationary regime, t — ► oo, we find, 



(k) MF = e(N-i) 



(13) 



for a fixed-size network evolving under per- 
fect duplication for arbitrary 9. So, for a net- 
work of size TV = 200, for example, we pre- 
dict, using Eq. (13), (k) = 1.99,19.9,99.5 for 
9 = 0.01,0.1,0.5 respectively. We can compare 
this prediction with the exact numerical results 
for the first moment of the degree distribution. 
As shown in Fig. 4, there is exact agreement be- 
tween the asymptotic value of the average de- 
gree determined from the mean-field calculation 
and the exact numerical results (for t > 10 5 ). 
Thus, both the mean- field calculation and the 
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FIG. 4: Exact numerical results for the average de- 
gree, (k) , versus time for a fixed-size network, pdci = 
Pdu P i = l, with TV = 200 nodes, and fcinitiai = 50. The 
network evolved through perfect duplication and 
increasing 9 = 0.01,0.025,0.05,0.075,0.1,0.3,0.5 
(marked with lines of decreasing dash- length) . In 
each case, (k) reaches a stationary value whose 
value is identical to that predicted analytically (cir- 
cles), (fc) M F = 0(N - 1), see Eq. (13). 

exact numerical results, as illustrated in Fig. 4, 
demonstrate the existence of an attractive fixed 
point in the average degree, (k). This is clearly 
related to the existence of a stationary degree 
distribution, that is, P(k,t) = P(k,t + 2). 

Equation (13) highlights the importance of 
accounting for the mechanism of heterodimcri- 
sation for finite-sized networks, as it shows that 
the network self-organises to a stationary state 
where the average degree (k) is constant, and 
determined by the system size, TV and the prob- 
ability for heterodimerisation, 9. Since real 



PINs are of finite size, typically with no more 
than 10 4 nodes (see Table I), the point of in- 
formation regarding the role of 9 in finite-sized 
networks is of significance. 

We have also verified through further simu- 
lations varying p such that p < 1 for constant 
9 and TV clearly dramatically reduces (k), as 
one would expect, although the precise nature 
of this effect has not yet been quantified and 
seems to be non-trivial. Thus, Eq. (13) actu- 
ally gives an upper bound for (k). 

An alternative method for calculating (k) an- 
alytically, is to calculate the first moment of the 
degree distribution as expressed in Eq. (10), 
(fc)sF- This turns out to be very useful as far 
as determining an expression for the apparent 
exponent, 7, in terms of 9. We find that, 



(k) 



SF 



/N 
kP(k;N)dk 



1 



/ r(i-7)r(i+ 7 ) 
Tvr(2- 7 ) 



1-7 



2 r(i 



1) 



for TV — > 00 



(14) 



Since we already know that (fc) MF = 9{N — 1) 
from Eq. (13), the RHS of Eq. (14) must be 
equivalent to Eq. (13), hence, 



7 = 1 - 29. 



(15) 



This justifies our previous finding in Sec. IV. 2 
that the apparent exponent is 7 = 1 — 29. In 
the following section, we investigate the effect 
on the degree distribution of varying 9 £ (0, 0.5) 
for fixed TV, completing the analysis of the two 
scenarios outlined at the beginning of Sec. IV. 
2. 



4- Effects of varying 8 

Figure 5 illustrates the topological effect of 
varying the probability to heterodimerise in the 
range < 9 < 0.5, in a fixed-size network (TV = 
200), evolving through perfect duplication. 

We find that the degree distribution exhibits 
a power law with a slope that varies with 9 ac- 
cording to 7 = 1 — 29, reaching a uniform distri- 
bution at a value of = 0.5. We have repeated 
the above simulations for a range of network 
sizes, up to TV = 1000, and confirmed this be- 
haviour. Hence, the result for 7 is consistent 
with our previous findings in Sec. IV. 2. 

Current estimates from empirical data for 
yeast, fly and human PINs indicate that never 
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FIG. 5: Exact numerical results for the de- 
gree distribution in the stationary regime for a 
fixed-size network, pdei = Pdupi = 1, with N = 
200 nodes, and fcnitial = 50. The network 
evolved through perfect duplication and increasing 
6 = 0.01, 0.025, 0.05, 0.07, 0.1, 0.3, 0.5 (marked with 
lines of decreasing dash-length). The value 8 = 0.5 
marks a change in behaviour from a power-law de- 
cay with a negative exponent to a uniform distri- 
bution. 

exceeds 0.1 [8]. Since we observe power-law de- 
gree distributions in the range < 9 < 0.5, 
a value of 9 < 0.1 in our model is consistent 
with empirical data. The fact that 7 = 1 — 29 
might go some way towards explaining why in 
the duplication model considered in Ref. [8], 
for realistic values of 9 < 0.1 their results were 
not affected. 



5. Effects of varying p 

Up until now, we have been investigating the 
effects of varying 9 and N, for fixed p^d = 
Pdupi = P = 1) on the degree distribution. Wc 
now report our findings for a third possible sce- 
nario: the effect of varying p, for fixed 9 and N 
(keeping p dc i = p dup i = 1, as before). 

Figure 6 illustrates the topological effect of 
varying p in a fixed-size network with het- 
crodimcrisation 9 = 0.1. There is clearly a 
marked difference between the curve for p = 1 
and the family of curves for p < 1. We es- 
tablished in Sec. IV. 2-4 that for p = 1 and 
9 e (0,0.5), the degree distribution is well ap- 
proximated by a power-law decay. We now 
see that for p < 1, the power-law behaviour 
is no longer observed and a characteristic de- 
gree is present. We have confirmed this re- 
sult for a range of network sizes, specifically, 
N = 50,100,400,1000. Moreover, we have 
found that the second moment (fc 2 ) does not di- 
verge with increasing network size as one would 
expect if, in the limit of ./V — > 00, the degree dis- 



tribution were indeed described by a power-law 
decay. 
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FIG. 6: Exact numerical results for the degree 
distribution in the stationary regime for a fixed- 
size network, pd c i = Pdupi = 1, with N = 200 
nodes, for increasing duplication probabilities p = 
0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.975, 0.99, 1 (marked 
with lines of decreasing dash-length) and het- 
crodimerisation (9 = 0.1. There is a marked differ- 
ence between the curves generated with p < 1 and 
with p = 1. Whereas the latter is described by a 
power-law decay, the former are not and a charac- 
teristic degree is present. 

We can offer a simple heuristic argument 
to account for the difference between the case 
p = 1 and p < 1. We believe that it is di- 
rectly related to the choices we have made for 
the remaining parameters of the model, namely 
Pdci = Pdupi = 1 and 9 > 0. For p < 1 at each 
duplication event only some of the links of the 
mother node are copied by the daughter node, 
whereas for p = 1, all of the mother node's 
links are copied. Given that pdc\ — 1 we delete 
a node and all of its links at each time step, 
and thus at each duplication event, if p < 1, 
we do not compensate for the loss of links in- 
curred through the deletion process. Hence, the 
repeated application of duplication events with 
p < 1, given that pdci = 1 accounts for the fact 
that the observed degree distribution does not 
follow a power-law decay but rather has a char- 
acteristic degree present. 

6. Comment on relation to biological data 

We can test the suitability of the DDDH 
model as a representation of the evolution of 
PINs by comparing our results against empiri- 
cal data. Using data cited in Ref. [8] we can 
obtain values of the number of proteins in the 
network and estimates of the average degree of 
the PINs for yeast, fly, and human. From these, 
we can derive estimates for 9 using Eq. (13) and 
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the apparent scaling exponent, 7 = 1 — 29. The 
results are tabulated in Table I. 



Data 


N 


(k) 


e 


7 


Yeast (I) 


4873 


6.6 


0.0014 


1.0 


Yeast (II) 


5397 


29.2 


0.0054 


1.0 


Fly 


6954 


5.9 


0.00085 


1.0 


Human 


5275 


5.7 


0.0011 


1.0 



TABLE I: Comparison between empirical data [8, 
42] and the fixed-size, perfect duplication DDDH 
model. Values of 9 and 7 are quoted to 2 decimal 
places. 



For all data sets, the calculated value of 9 
is of the order of 10~ 3 which agrees well with 
the fact that it is believed heterodimerisation 
occurs at a rate not greater than 0.1 [8]. More- 
over, the corresponding apparent scaling ex- 
ponent extracted in all cases is found to be 
7 = 1.0. This is consistent with our FSS analy- 
sis and corroborates Rcf. [8] where a power law 
with degree exponent 7 « 1.1 is given for the 
data for yeast derived from [42] . 



V. DISCUSSION 

The results in Sec. IV indicate that the de- 
gree distribution of fixed-size networks where 
the rate of node deletion is equal to the rate of 
node duplication, pdci = Pdupi = 1, is depen- 
dent on several features. We summarise these 
as follows, (a) In order to observe a power-law 
degree distribution it is necessary to have p = 1 , 
that is, perfect duplication, and < 9 < 0.5. 

(b) No power-law is observed in the degree dis- 
tribution for p < 1, or for p = 1 and 9 > 0.5. 

(c) In all cases, it is necessary for 9 to be non- 
zero in order for a positive average degree to 
be obtained in the stationary regime since for 
9 = 0, (k) = 0. (d) In cases where a positive av- 
erage degree is obtained, it is notable that the 
network self-organises into a stationary state. 
We see this as an advantageous feature of this 
model and comment that this is in contrast to 
some other network models, such as in Rcf. [19] , 
where the average degree is a fixed parameter 
in the model, or the duplication models where 
the average degree scales with the network size, 
(e) Our FSS analysis indicates that for fixed 
9 E (0,0.5), the scaling exponent of the asso- 
ciated (proper) FSS ansatz is fixed and equal 
to 1. The new result is obtained through the 
inclusion of a system-size dependency in the 
prefactor, a(N), necessary when the apparent 
exponent 7 < 1, which results in the scaling 
function being recast in such a way that the 



only relevant scale parameterising the system 
is determined by the cut-off, given by N in our 
case. This example illustrates that it is neces- 
sary to be cautious when doing a FSS ansatz in 
systems where the apparent exponent appears 
to be less than 1 [43]. (f) Estimates of and 7 
for networks for yeast, fly and human are con- 
sistent with estimates from empirical data and 
our FSS analysis. 

If we accept the fact that the fixed-size ver- 
sion of the DDDH model in spite of its sim- 
plicity is able to reproduce certain observed 
topological features of PINs, this in turn would 
require us to revise the idea that the protein 
repertoire has evolved over millions of years 
from a small set of genes to the genomes we ob- 
served today in multi-cellular organisms which 
are typically composed of tens of thousands of 
genes since the two are not compatible. Clearly, 
this is a rather drastic measure. Rather than 
accept such a state of affairs, perhaps all that 
the results of the fixed-size DDDH model indi- 
cate thus far is that we should exercise caution 
when interpreting minimalistic network models, 
as attractive as they are. Since we can con- 
jure up many varied and simple network mod- 
els, with and without growth, which are capa- 
ble of reproducing observed features of complex 
systems perhaps the only recourse when trying 
to pick one network model over the other is to 
carefully use our knowledge of the essence of 
the original real system [44]. 



A. Extensions 

As a first step in investigating the behaviour 
of the DDDH model, it seems reasonable to 
keep things as simple as possible, as we have 
done here. We have reported on the case of 
fixed-size networks, and are currently investi- 
gating how the topology is affected by varying 
the relative rates of node deletion and duplica- 
tion. Moreover, sensitivity to initial conditions 
is being probed further; we believe that for the 
fixed-size case, the network features are inde- 
pendent of initial conditions. 

However, beyond the steps we have men- 
tioned, there are obvious extension of this 
model which further investigations could ben- 
efit from including. For example, the model is 
based on an undirected network - it would be 
interesting to see how best to incorporate dy- 
namics based on directed links and what affect 
this would have on the in-degree distribution 
and out-degree distribution. Incorporating this 
feature would make the model well-suited to de- 
scribing genetic regulatory networks, for exam- 
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pie. 

Moreover, the model assumes that the rate 
of node deletion and duplication are indepen- 
dent of one other, and independent of any fea- 
ture of the network such as the size; one could 
imagine the scenario where this is not the case. 
Moreover, we consider single-node deletion and 
single-node duplication - an interesting varia- 
tion would be to consider multiple-node dele- 
tion or duplication, or even duplication of whole 
modules (motifs) as in Rcf. [45], for example. 

Finally, the only cause of an increase or de- 
crease in the number of links is either due to a 
node deletion or node duplication event: links 
are not added or deleted through any other 
mechanism. The scenario where (directed) 
links are stochastically added or removed be- 
tween already existing nodes would be an inter- 
esting amendment to investigate, particularly 
with regards to the resulting effect on the de- 
gree distribution and its corresponding expo- 
nent [28, 30, 46]. 

A final example is that there is no fitness pa- 
rameter in the model, nor any rule based on se- 
lection - our results are independent of both of 
these features at the gene/protein level, and at 
the network level, yet it is widely believed that 
both features are driving forces in the evolution 
of most, if not all, biological systems. Includ- 
ing these features in a meaningful way would 
be a highly relevant step towards understand- 
ing some of the thornier questions in modern 
biology today. 

VI. CONCLUSIONS 

We have introduced and discussed a minimal- 
istic model governed by four parameters, based 



on dynamic node deletion and node duplication 
with heterodimerisation. The model is intended 
to capture some basic features in the evolution 
of protein interaction networks but we believe 
that it is also suited to other types of networks 
in light of the suggested modifications. 

Power-law degree distributions were observed 
for generic parameter values, and a novel finite- 
size scaling effect was observed for the case of 
fixed-size networks evolving through perfect du- 
plication and 9 e (0,0.5). The existence of an 
attractive fixed point in the average degree was 
derived based on mean- field arguments, and 
corroborated with numerical simulations of the 
first moment of the degree distribution as de- 
scribed by the two-step rate equation. The 
above results were then used to to derive a re- 
lation for the apparent exponent, 7 = 1 — 29. 

Our results thus far indicate consistency 
with empirical data. Further investigations are 
required to fully explore and understand the 
wider phase-space inhabited by this model, and 
several suggestions have been made to this end. 
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